Session Segmentation Based on Document Metadata
Abstract
Search personalization has been shown to benefit greatly from exploiting the user's short-term context, that is, their immediate needs and focus. To achieve that, we need to be able to tell when the context changes: we must divide the user's activity into segments, where each segment captures a single goal and focus of the user. Many different approaches exist, but their major weakness is that they build inaccurate models that do not include the user's implicit feedback. We present a method for segmenting queries into search sessions that is based on document metadata and incorporates implicit feedback, and as such is able to build a more accurate context model.
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing step by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Detecting Search Sessions Using Document Metadata and Implicit Feedback
It has been shown that search personalization can greatly benefit from exploiting the user's short-term context, that is, the user's immediate need and intent. However, this requires that the search engine be able to divide the user's activity into segments, where each segment captures the user's single goal and focus. Several different approaches to search session segmentation exist, each considering different f...
Metadata extraction and text categorization using Universal Resource Locator expansions
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of URLs to yield categoric metadata about web resources via a three-phase pipeline of word segmentation, abbreviation expansion and classification. I apply this approach to the problem of subject metadat...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, connected components analysis segments each region into homogeneous regions and identifyi...